Speech acts and dialog TTS
Abstract
The approach outlined in this paper aims to make unit selection TTS more expressive for dialog-oriented applications while retaining the natural-sounding voice quality typical of unit selection synthesis. A small set of speech acts was used to annotate a corpus recorded by one female US English speaker. The corpus consisted primarily of speech read from interactive dialogs of various kinds. Global acoustic variables related to prosody were calculated for each speech act in the corpus. A hierarchical cluster analysis of these acoustic variables yielded clusters that corresponded to general classes of dialog speech acts. The acoustic prosodic variables were then used to specify the pitch range parameters of a unit selection Speech Act TTS voice. Listening tests indicated a large and significant improvement in rated speech quality for the Speech Act system compared with the Standard TTS system built from the same speaker's recordings.
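The analysis step described above (global prosodic variables per speech act, then hierarchical clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not list its acoustic variables, distance measure, or linkage criterion, so mean F0 and F0 range, Euclidean distance, and average linkage are assumed here. The speech-act labels and F0 contours are hypothetical toy values, and `global_prosody` and `average_linkage` are illustrative helper names.

```python
from itertools import combinations
from math import dist
from statistics import mean


def global_prosody(contours):
    """Assumed global variables for one speech act: the average of the
    per-utterance mean F0 and the average per-utterance F0 range (Hz)."""
    means = [mean(c) for c in contours]
    ranges = [max(c) - min(c) for c in contours]
    return (mean(means), mean(ranges))


def average_linkage(features, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters with
    the smallest average pairwise Euclidean distance (assumed linkage)."""
    clusters = [[name] for name in features]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean(
                dist(features[a], features[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical F0 contours (Hz), two utterances per hypothetical speech act.
contours = {
    "greeting": [[220, 260, 240, 200], [230, 270, 250, 210]],
    "apology": [[180, 170, 160], [175, 165, 155]],
    "confirmation": [[200, 240, 215], [205, 250, 220]],
    "error_report": [[170, 160, 150], [172, 162, 158]],
}
features = {act: global_prosody(cs) for act, cs in contours.items()}
print(average_linkage(features, n_clusters=2))
# → [['apology', 'error_report'], ['confirmation', 'greeting']]
```

With these toy values the low, narrow-range acts group apart from the higher, wider-range ones. In a real voice, per-cluster values of this kind are one plausible input for pitch range parameters, though the abstract does not say exactly how that mapping was done.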